115 research outputs found

    Research Interests Databases

    Get PDF

    Discovering Clusters in Motion Time-Series Data

    Full text link
    A new approach is proposed for clustering time-series data. The approach can be used to discover groupings of similar object motions that were observed in a video collection. A finite mixture of hidden Markov models (HMMs) is fitted to the motion data using the expectation-maximization (EM) framework. Previous approaches for HMM-based clustering employ a k-means formulation, where each sequence is assigned to only a single HMM. In contrast, the formulation presented in this paper allows each sequence to belong to more than a single HMM with some probability, and the hard decision about the sequence class membership can be deferred until a later time when such a decision is required. Experiments with simulated data demonstrate the benefit of using this EM-based approach when there is more "overlap" in the processes generating the data. Experiments with real data show the promising potential of HMM-based motion clustering in a number of applications.Office of Naval Research (N000140310108, N000140110444); National Science Foundation (IIS-0208876, CAREER Award 0133825

    Efficient Correlation Clustering Methods for Large Consensus Clustering Instances

    Full text link
    Consensus clustering (or clustering aggregation) inputs kk partitions of a given ground set VV, and seeks to create a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus clustering are based on correlation clustering methods like the popular Pivot algorithm. Unfortunately these methods have not proved to be practical for consensus clustering instances where either kk or VV gets large. In this paper we provide practical run time improvements for correlation clustering solvers when VV is large. We reduce the time complexity of Pivot from O(∣V∣2k)O(|V|^2 k) to O(∣V∣k)O(|V| k), and its space complexity from O(∣V∣2)O(|V|^2) to O(∣V∣k)O(|V| k) -- a significant savings since in practice kk is much less than ∣V∣|V|. We also analyze a sampling method for these algorithms when kk is large, bridging the gap between running Pivot on the full set of input partitions (an expected 1.57-approximation) and choosing a single input partition at random (an expected 2-approximation). We show experimentally that algorithms like Pivot do obtain quality clustering results in practice even on small samples of input partitions

    A Comparative Evaluation of Order-Revealing Encryption Schemes and Secure Range-Query Protocols

    Get PDF
    Database query evaluation over encrypted data can allow database users to maintain the privacy of their data while outsourcing data processing. Order-Preserving Encryption (OPE) and Order-Revealing Encryption (ORE) were designed to enable efficient query execution, but provide only partial privacy. More private protocols, based on Searchable Symmetric Encryption (SSE), Oblivious RAM (ORAM) or custom encrypted data structures, have also been designed. In this paper, we develop a framework to provide the first comprehensive comparison among a number of range query protocols that ensure varying levels of privacy of user data. We evaluate five ORE-based and five generic range query protocols. We analyze and compare them both theoretically and experimentally and measure their performance over database indexing and query evaluation. We report not only execution time but also I/O performance, communication amount, and usage of cryptographic primitive operations. Our comparison reveals some interesting insights concerning the relative security and performance of these approaches in database settings

    Generalized Methods for Discovering Frequent Poly-Regions in DNA

    Full text link
    The problem of discovering frequent poly-regions (i.e. regions of high occurrence of a set of items or patterns of a given alphabet) in a sequence is studied, and three efficient approaches are proposed to solve it. The first one is entropy-based and applies a recursive segmentation technique that produces a set of candidate segments which may potentially lead to a poly-region. The key idea of the second approach is the use of a set of sliding windows over the sequence. Each sliding window covers a sequence segment and keeps a set of statistics that mainly include the number of occurrences of each item or pattern in that segment. Combining these statistics efficiently yields the complete set of poly-regions in the given sequence. The third approach applies a technique based on the majority vote, achieving linear running time with a minimal number of false negatives. After identifying the poly-regions, the sequence is converted to a sequence of labeled intervals (each one corresponding to a poly-region). An efficient algorithm for mining frequent arrangements of intervals is applied to the converted sequence to discover frequently occurring arrangements of poly-regions in different parts of DNA, including coding regions. The proposed algorithms are tested on various DNA sequences producing results of significant biological meaning
    • …
    corecore